Investigating measurement invariance of the Emotion Regulation Questionnaire-8 (ERQ-8) across 29 countries

The widely used Emotion Regulation Questionnaire (ERQ) measures the habitual use of cognitive reappraisal and expressive suppression. Recently, a more economical 8-item version of the ERQ was proposed that showed good model fit. We assessed whether the latent constructs of the ERQ-8 are generalizable across different countries and cultures. To this end, we used data from the COVIDiSTRESS survey and investigated measurement invariance of the ERQ-8 in a large sample that included 11,288 individuals from 29 countries with diverse cultural backgrounds. Our analyses revealed configural and metric invariance of the ERQ-8 in 14 countries. The results suggest that emotion regulation strategies may not readily converge across all cultures. This underscores the importance of testing measurement invariance before interpreting observed differences and similarities between countries.
Supplementary information: The online version contains supplementary material available at 10.1007/s12144-022-04220-6.

Emotion regulation refers to our ability to modify feelings and behavior (Gross, 1998). This includes the use of different emotion regulation strategies to either downregulate negative emotions or upregulate positive ones (McRae & Gross, 2020). For instance, one might try to distract oneself to calm down when angry, or share great news with loved ones to prolong positive feelings (Gross, 2014). Numerous studies have shown that effective emotion regulation is crucial for well-being in everyday life (e.g., Kobylińska et al., 2022), and that emotion dysregulation is found in many disorders, such as depression (Hui et al., 2021), hypochondriasis (Bailer et al., 2017), and borderline personality disorder (Chapman, 2019).
Two extensively investigated emotion regulation strategies are cognitive reappraisal and expressive suppression (McRae & Gross, 2020). The former refers to the cognitive reinterpretation of an emotional event, thereby altering its emotional impact (e.g., telling oneself that this is just a movie and no one got hurt; McRae & Gross, 2020).
According to James Gross's Process Model of Emotion Regulation, reappraisal is considered antecedent-focused, meaning that it occurs before the emotional response has fully developed (Gross & John, 2003). In contrast, suppression is a response-oriented strategy that involves inhibiting the behavioral expression of an emotion that has already fully developed (e.g., keeping a neutral face even though one is sad; Gross & John, 2003; McRae & Gross, 2020). While both strategies have been shown to be effective in the short term (Germain & Kangas, 2015), studies on long-term effects suggest that the habitual use of suppression is associated with poor psychological outcomes (Low et al., 2021), whereas reappraisal is positively associated with physical health (Appleton et al., 2013), academic achievement (Ivcevic & Brackett, 2014), and resilience (Kuhlman et al., 2021). Interestingly, there is little evidence of cultural differences in reappraisal, but strong support for the view that the negative effects of habitual suppression are culture-specific (Ramzan & Amjad, 2017; Schunk et al., 2022). Due to differences in norms and values, people in collectivistic cultures are more likely to suppress emotions to maintain social harmony than people in individualistic cultures (Matsumoto et al., 2008). Expressive suppression is therefore more common and acceptable among collectivist-oriented individuals and consequently associated with fewer or no adverse outcomes (Ramzan & Amjad, 2017). This cross-cultural variation raises questions about the comparability of emotion regulation assessment tools across countries.
One tool commonly used to assess individual differences in the dispositional use of emotion regulation strategies is the Emotion Regulation Questionnaire (ERQ; Gross & John, 2003). The ERQ comprises 10 items, six of which measure cognitive reappraisal and four of which measure expressive suppression. To date, the ERQ has been translated into 37 different languages (Stanford University, 2022) and has been validated across a wide range of countries, with the original 10-item two-factor structure generally showing good model fit (e.g., Balzarotti et al., 2010; D'Argembeau & Van der Linden, 2006; Matsumoto et al., 2008; Melka et al., 2011). However, these validation studies have been criticized in the past for relying exclusively on student samples (e.g., Spaapen et al., 2014). Some of the studies evaluating the ERQ in non-student samples failed to replicate the same factor structure and ended up with a 9-item (e.g., Spaapen et al., 2014) or 8-item (e.g., Balzarotti, 2021) solution. Such discrepancies in the factor structure of the ERQ may be due to differences across samples or cultures and underscore the importance of testing measurement invariance (Boer et al., 2018; Gong et al., 2021). Measurement invariance indicates whether an instrument measures the same underlying latent construct across different groups (e.g., countries) and is therefore a prerequisite for a meaningful interpretation of observed differences and similarities between these groups (Milfont & Fischer, 2010). However, few cross-cultural comparative studies examine measurement invariance before drawing conclusions from their data (Boer et al., 2018).

The present study
The ERQ is a questionnaire used worldwide to assess the habitual use of emotion regulation strategies. Recently, the feasibility of a more economical 8-item version has been demonstrated (Balzarotti, 2021). This makes it a promising candidate for large cross-cultural studies on emotion regulation. Here, we investigate measurement invariance of the ERQ-8 in a large and diverse sample of individuals from 29 different countries to explore the validity of the two-factor solution and to provide norm values across different nations.

Sample
Data for this study were obtained from the global COVIDiSTRESS survey, which collected data on psychological and behavioral outcomes one year into the COVID-19 pandemic (Blackburn et al., 2022). The survey was conducted in 137 different countries during the summer of 2021 and yielded responses from a total of 15,740 participants, with a range of 1 to 2,260 participants per country. To the best of our knowledge, there are no clear recommendations for minimum sample sizes in confirmatory factor analysis (CFA) studies (Wolf et al., 2013). Thus, we decided to include those countries with at least 100 complete responses to maximize cultural variance while maintaining adequate power for our analyses (Boer et al., 2018; Ruggeri et al., 2022). This resulted in a total sample of 11,288 individuals from 29 countries. The demographic characteristics can be found in Table 1 (for demographic information on the original COVIDiSTRESS survey sample, see Blackburn et al., 2022).

Materials
The ERQ-8 omits two reappraisal items of the ERQ-10: Item 1 ("When I want to feel more positive emotion (such as joy or amusement), I change what I'm thinking about.") and Item 3 ("When I want to feel less negative emotion (such as sadness or anger), I change what I'm thinking about."). Participants were asked to rate each statement on a Likert scale from 1 ("strongly disagree") to 7 ("strongly agree"). All items were administered in the official language of the respective country.

Procedure
Official translations of the ERQ were used in all countries. Further details on the entire COVIDiSTRESS study procedure, including inclusion criteria and data post-processing, can be found in Blackburn et al. (2022). For the purposes of our study, we obtained the open-access dataset from the Open Science Framework (https://osf.io/36tsd/; Blackburn et al., 2022). Participants who did not complete the ERQ-8 or had items missing were removed (n = 2842). Following listwise deletion, 29 countries with at least 100 participants were included.
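The two selection steps described above (listwise deletion on the ERQ-8 items, then retaining countries with at least 100 complete cases) can be sketched as follows. This is a minimal illustration, not the authors' script: the column names (`country`, `erq2`, ... `erq10`) and the function `select_sample` are hypothetical and do not come from the COVIDiSTRESS dataset.

```python
from collections import Counter

# Hypothetical column names for the eight ERQ-8 items (assumption, not the dataset's names).
ERQ8_ITEMS = ["erq2", "erq4", "erq5", "erq6", "erq7", "erq8", "erq9", "erq10"]
MIN_PER_COUNTRY = 100  # inclusion threshold stated in the text


def select_sample(rows, items=ERQ8_ITEMS, min_n=MIN_PER_COUNTRY):
    """Apply listwise deletion, then keep countries with at least min_n complete cases.

    rows: list of dicts, one per participant, with a "country" key and one key per item.
    A missing item is represented by None or by an absent key.
    """
    # Step 1: listwise deletion. Drop any participant with a missing ERQ-8 item.
    complete = [r for r in rows if all(r.get(item) is not None for item in items)]

    # Step 2: count complete cases per country and keep countries at or above the threshold.
    counts = Counter(r["country"] for r in complete)
    kept = {country for country, n in counts.items() if n >= min_n}
    return [r for r in complete if r["country"] in kept]
```

Note that the order of the steps matters: the threshold is applied to complete cases, so a country with more than 100 respondents can still be excluded if too many of them left ERQ-8 items unanswered.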

Analysis
Data were analyzed using CFA with maximum likelihood estimation in R with the lavaan package (Rosseel, 2012). Because our data were not multivariate normal, we also performed our analyses using robust maximum likelihood estimation. This did not change the results; the robust analyses are reported in the supplementary material (see Supplementary Tables 1 and 2). Our model allowed the factors reappraisal (items 5, 7, 8, and 10) and suppression (items 2, 4, 6, and 9) to correlate. We followed the customary forward-step approach to measurement invariance (Sass, 2011) by (1) estimating our model in each country separately, and subsequently testing (2) a configural invariance model (i.e., factor structure held equal), (3) a metric invariance model (i.e., factor structure and factor loadings held equal), and (4) a scalar invariance model (i.e., factor structure, factor loadings, and intercepts held equal).
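The nested models in steps (2) to (4) can be written out in standard multi-group CFA notation. The symbols below are not part of the original text and are introduced here only for illustration: g indexes countries (g = 1, ..., G), i indexes respondents, x is the vector of the eight item responses, and ξ contains the two latent factors.

```latex
% Measurement model for respondent i in country g
\[
  \mathbf{x}_{ig} = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_{ig} + \boldsymbol{\varepsilon}_{ig}
\]
% tau_g: item intercepts; Lambda_g: 8 x 2 loading matrix; epsilon_ig: residuals

% Constraints added at each step
\begin{align*}
  \text{Configural:} &\quad \boldsymbol{\Lambda}_g \text{ has the same pattern of zero and free loadings in every country} \\
  \text{Metric:}     &\quad \boldsymbol{\Lambda}_1 = \dots = \boldsymbol{\Lambda}_G \\
  \text{Scalar:}     &\quad \boldsymbol{\Lambda}_1 = \dots = \boldsymbol{\Lambda}_G,
                       \quad \boldsymbol{\tau}_1 = \dots = \boldsymbol{\tau}_G
\end{align*}

% Model-implied item means, with kappa_g = E(xi_ig) the latent means in country g
\[
  E(\mathbf{x}_{ig}) = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\kappa}_g
\]
```

The last expression shows why each level licenses different comparisons: differences in observed item means reflect differences in latent means only if both the intercepts and the loadings are equal across countries (scalar invariance), whereas equal loadings alone (metric invariance) are sufficient for comparing associations between the latent factors and other variables.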

Confirmatory factor analysis
Model fit was acceptable in 14 of the 29 countries (Table 2). Norm values are presented in Supplementary Tables 3 and 4 for the 14 countries in which the expected factor structure of the ERQ-8 was met. Factor correlations and internal consistencies are reported in Supplementary Table 5.

Multi-group confirmatory factor analysis and measurement invariance
We performed stepwise measurement invariance analyses with both the full sample (29 countries) and the subsample of countries that showed acceptable fit in the CFA (14 countries).
In the subsample of 14 countries, we found acceptable fit indices for the configural invariance model as well as acceptable fit differences for the metric invariance model. This was not the case in the full sample (Table 3).
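A decision rule for "acceptable fit differences" between two nested invariance models can be sketched as below. The cutoffs used in this study are not stated in this section, so the thresholds here (a drop in CFI of at most .01 and a rise in RMSEA of at most .015) are an assumption based on a commonly used convention, and the function name and example fit values are hypothetical.

```python
def invariance_step_supported(less_constrained, more_constrained,
                              max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Compare two nested models by their change in CFI and RMSEA.

    Both arguments are dicts with "cfi" and "rmsea" keys.
    The default cutoffs are assumed conventions, not values taken from the study.
    """
    cfi_drop = less_constrained["cfi"] - more_constrained["cfi"]
    rmsea_rise = more_constrained["rmsea"] - less_constrained["rmsea"]
    # The added equality constraints are tolerated only if fit does not worsen too much.
    return cfi_drop <= max_cfi_drop and rmsea_rise <= max_rmsea_rise


# Hypothetical fit indices, for illustration only (not the values in Table 3).
configural = {"cfi": 0.950, "rmsea": 0.060}
metric = {"cfi": 0.945, "rmsea": 0.062}
print(invariance_step_supported(configural, metric))
```

The same function could be applied to the metric versus scalar comparison; in a forward-step approach, testing stops at the first step where the rule fails.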

Discussion
We assessed the suitability of the ERQ-8 for cross-cultural comparisons by investigating its measurement invariance, and provided norm values for 14 countries. Although measurement invariance is a prerequisite for meaningful group comparisons (Milfont & Fischer, 2010), only a fraction of cross-cultural studies establish it before generalizing their findings across different samples (Boer et al., 2018). Our results emphasize the importance of such an approach, as the original factor structure of the ERQ-8 could only be replicated in 14 of the 29 countries (i.e., Brazil, Bulgaria, Colombia, Costa Rica, Germany, Ireland, Malaysia, Norway, Portugal, Slovakia, Spain, Switzerland, Turkey, and the UK). Moreover, metric invariance was achieved in the same group of countries, which allows comparisons of correlation coefficients. Scalar invariance, however, was not established, so comparisons of means are not warranted.
In view of these findings, the question arises as to why measurement invariance was not observed in all 29 countries. We suggest two possible explanations. The original ERQ might have performed better than the shorter ERQ-8 because this more parsimonious version may be flawed by the absence of two items. However, we consider this unlikely, as previous research has also demonstrated poor fit of the original ERQ in non-student samples (Balzarotti, 2021; Spaapen et al., 2014). Another reason may be that the latent constructs measured by the ERQ-8 cannot be readily generalized across all countries and cultures. The ERQ was developed by Gross and John (2003) on a student sample from a so-called Western, Educated, Industrialized, Rich, and Democratic (WEIRD) country. It is commonly recognized that people from WEIRD nations differ fundamentally from people from non-WEIRD nations in many psychological processes (Cheon et al., 2020). This is not to say that the ERQ/ERQ-8 should not be used in non-WEIRD countries, but rather that comparisons across countries without measurement invariance should be made with caution, as they may be measuring conceptually different constructs (Boer et al., 2018). Future research should aim to shed more light on this issue and strive to include more samples from non-WEIRD countries, particularly since differences in the use of suppression and reappraisal have already been linked to different cultural values (Matsumoto et al., 2008; Ramzan & Amjad, 2017; Schunk et al., 2022).
In contrast to Balzarotti (2021), we did not find a good model fit of the ERQ-8 among Italians. However, a closer inspection of the sample composition shows that our sample, although similar in age, includes more female participants (73.3% vs. 52% in Balzarotti, 2021). This may explain the differences in model fit, but would also suggest that the latent constructs measured by the ERQ-8 vary between men and women. This is doubtful, however, because studies from different cultures have found acceptable measurement invariance of the ERQ across genders (Melka et al., 2011; Zhang & Bian, 2020). Further studies are needed to identify causes of measurement non-invariance that go beyond cultural factors.

Limitations
Despite the strengths of our study, such as the large number of participants from 29 countries with diverse cultural backgrounds, our results must be interpreted with some limitations in mind. First, the samples might not be representative of their respective country's true population. Although the number of students is rather small, the overall educational level is high. That is, people of lower socioeconomic status are likely underrepresented in this study. This must also be taken into account when using the norm values from our supplementary materials. Second, the COVIDiSTRESS survey was carried out in the context of the COVID-19 pandemic. This may have raised participants' stress levels and influenced their responses, possibly leading to less frequent reporting of functional emotion regulation strategies. Third, the German translation of the ERQ-8 (which was used in Germany and Switzerland) lacked the negation of item 9. Although this was easily resolved in our analyses (by reverse coding the responses), from a psychometric point of view this item no longer corresponds to the original item (Schriesheim et al., 1991). Fourth, the internal consistency of both subscales was poor in some countries (α < 0.70). Poor model fit in these countries may therefore reflect reliability problems rather than actual differences in the latent constructs measured. Finally, based on our data, we cannot draw firm conclusions about the reasons for the differences in measurement invariance across countries. We have discussed potential explanations, but these remain to be explored in future studies.
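Two of the computations mentioned above, reverse coding an item on the 1 to 7 response scale and estimating internal consistency with Cronbach's α, can be sketched as follows. This is a minimal illustration using the standard α formula with sample variances; the function names are hypothetical and the code is not taken from the authors' analysis scripts.

```python
from statistics import variance


def reverse_code(score, low=1, high=7):
    """Reverse a response on a low..high Likert scale (1 becomes 7, 7 becomes 1)."""
    return low + high - score


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale.

    responses: list of per-participant lists, each holding the k item scores
    of that subscale (complete cases only, at least two participants).
    """
    k = len(responses[0])
    # Sample variance of each item across participants.
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    # Sample variance of the summed subscale score.
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For a suppression subscale in which item 9 was presented without its negation, the item would first be passed through `reverse_code` for every participant and α would then be computed on the four recoded suppression items.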

Conclusion
The ERQ-8 is an economical instrument for assessing the habitual use of suppression and reappraisal. It shows configural and metric invariance across 14 different countries, allowing cross-cultural comparisons and generalizability of correlation coefficients. However, the lack of measurement invariance in 15 additional countries suggests possible differences in latent constructs measured by the ERQ-8. This highlights the need to test measurement invariance in future cross-cultural studies.